{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python绘制电商网站转化漏斗图"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "漏斗图：适用于业务流程环节多的流程分析，通过漏斗各环节业务数据的比较，能够直观地发现问题所在。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实例：数据来自kaggle网站的\"E-commerce website Funnel analysis\"  \n",
    "地址为：https://www.kaggle.com/aerodinamicc/ecommerce-website-funnel-analysis\n",
    "\n",
    "网站很简单，有四个页面数据：\n",
    "1. home_page_table.csv，首页用户访问数据\n",
    "2. search_page_table.csv，搜索页用户访问数据\n",
    "3. payment_page_table.csv，支付信息页用户访问数据\n",
    "4. payment_confirmation_table.csv，支付成功页用户访问数据\n",
    "5. user_table.csv，用户信息数据\n",
    "\n",
    "目标：绘制转化漏斗，查看是否正常"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 读取数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 读取5个数据表到df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_home_page = pd.read_csv(\"./datas/ecommerce-website-funnel-analysis/home_page_table.csv\")\n",
    "df_search_page = pd.read_csv(\"./datas/ecommerce-website-funnel-analysis/search_page_table.csv\")\n",
    "df_payment_page = pd.read_csv(\"./datas/ecommerce-website-funnel-analysis/payment_page_table.csv\")\n",
    "df_payment_confirmation_page = pd.read_csv(\"./datas/ecommerce-website-funnel-analysis/payment_confirmation_table.csv\")\n",
    "df_user_table = pd.read_csv(\"./datas/ecommerce-website-funnel-analysis/user_table.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看首页数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>page</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>313593</td>\n",
       "      <td>home_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>468315</td>\n",
       "      <td>home_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>264005</td>\n",
       "      <td>home_page</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id       page\n",
       "0   313593  home_page\n",
       "1   468315  home_page\n",
       "2   264005  home_page"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_home_page.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看搜索页数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>page</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>15866</td>\n",
       "      <td>search_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>347058</td>\n",
       "      <td>search_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>577020</td>\n",
       "      <td>search_page</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id         page\n",
       "0    15866  search_page\n",
       "1   347058  search_page\n",
       "2   577020  search_page"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_search_page.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看支付信息页数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>page</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>253019</td>\n",
       "      <td>payment_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>310478</td>\n",
       "      <td>payment_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>304081</td>\n",
       "      <td>payment_page</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id          page\n",
       "0   253019  payment_page\n",
       "1   310478  payment_page\n",
       "2   304081  payment_page"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_payment_page.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看支付成功页数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>page</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>123100</td>\n",
       "      <td>payment_confirmation_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>704999</td>\n",
       "      <td>payment_confirmation_page</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>407188</td>\n",
       "      <td>payment_confirmation_page</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id                       page\n",
       "0   123100  payment_confirmation_page\n",
       "1   704999  payment_confirmation_page\n",
       "2   407188  payment_confirmation_page"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_payment_confirmation_page.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看用户信息表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>date</th>\n",
       "      <th>device</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>450007</td>\n",
       "      <td>2015-02-28</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>756838</td>\n",
       "      <td>2015-01-13</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>568983</td>\n",
       "      <td>2015-04-09</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id        date   device     sex\n",
       "0   450007  2015-02-28  Desktop  Female\n",
       "1   756838  2015-01-13  Desktop    Male\n",
       "2   568983  2015-04-09  Desktop    Male"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_user_table.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看设备的类型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Desktop    60200\n",
       "Mobile     30200\n",
       "Name: device, dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_user_table[\"device\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查看性别的类型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Male      45325\n",
       "Female    45075\n",
       "Name: sex, dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_user_table[\"sex\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 关联5个数据表为一个大表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_merge = df_user_table\n",
    "\n",
    "for df_inter in [df_home_page, df_search_page, df_payment_page, df_payment_confirmation_page]:\n",
    "    # 每次循环都会忘df_merge中添加新列\n",
    "    df_merge = pd.merge(\n",
    "        left=df_merge, \n",
    "        right=df_inter, \n",
    "        left_on=\"user_id\", \n",
    "        right_on=\"user_id\", \n",
    "        how=\"left\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>date</th>\n",
       "      <th>device</th>\n",
       "      <th>sex</th>\n",
       "      <th>page_x</th>\n",
       "      <th>page_y</th>\n",
       "      <th>page_x</th>\n",
       "      <th>page_y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>450007</td>\n",
       "      <td>2015-02-28</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>756838</td>\n",
       "      <td>2015-01-13</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>568983</td>\n",
       "      <td>2015-04-09</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>190794</td>\n",
       "      <td>2015-02-18</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>537909</td>\n",
       "      <td>2015-01-15</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90395</th>\n",
       "      <td>307667</td>\n",
       "      <td>2015-03-30</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90396</th>\n",
       "      <td>642989</td>\n",
       "      <td>2015-02-08</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90397</th>\n",
       "      <td>659645</td>\n",
       "      <td>2015-04-13</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90398</th>\n",
       "      <td>359779</td>\n",
       "      <td>2015-03-23</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90399</th>\n",
       "      <td>438929</td>\n",
       "      <td>2015-03-26</td>\n",
       "      <td>Mobile</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>90400 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       user_id        date   device     sex     page_x       page_y page_x  \\\n",
       "0       450007  2015-02-28  Desktop  Female  home_page          NaN    NaN   \n",
       "1       756838  2015-01-13  Desktop    Male  home_page          NaN    NaN   \n",
       "2       568983  2015-04-09  Desktop    Male  home_page  search_page    NaN   \n",
       "3       190794  2015-02-18  Desktop  Female  home_page  search_page    NaN   \n",
       "4       537909  2015-01-15  Desktop    Male  home_page          NaN    NaN   \n",
       "...        ...         ...      ...     ...        ...          ...    ...   \n",
       "90395   307667  2015-03-30  Desktop  Female  home_page          NaN    NaN   \n",
       "90396   642989  2015-02-08  Desktop  Female  home_page  search_page    NaN   \n",
       "90397   659645  2015-04-13  Desktop    Male  home_page  search_page    NaN   \n",
       "90398   359779  2015-03-23  Desktop    Male  home_page          NaN    NaN   \n",
       "90399   438929  2015-03-26   Mobile  Female  home_page          NaN    NaN   \n",
       "\n",
       "      page_y  \n",
       "0        NaN  \n",
       "1        NaN  \n",
       "2        NaN  \n",
       "3        NaN  \n",
       "4        NaN  \n",
       "...      ...  \n",
       "90395    NaN  \n",
       "90396    NaN  \n",
       "90397    NaN  \n",
       "90398    NaN  \n",
       "90399    NaN  \n",
       "\n",
       "[90400 rows x 8 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_merge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>date</th>\n",
       "      <th>device</th>\n",
       "      <th>sex</th>\n",
       "      <th>home_page</th>\n",
       "      <th>search_page</th>\n",
       "      <th>payment_page</th>\n",
       "      <th>confirmation_page</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>450007</td>\n",
       "      <td>2015-02-28</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>756838</td>\n",
       "      <td>2015-01-13</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>568983</td>\n",
       "      <td>2015-04-09</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>190794</td>\n",
       "      <td>2015-02-18</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>537909</td>\n",
       "      <td>2015-01-15</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90395</th>\n",
       "      <td>307667</td>\n",
       "      <td>2015-03-30</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90396</th>\n",
       "      <td>642989</td>\n",
       "      <td>2015-02-08</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90397</th>\n",
       "      <td>659645</td>\n",
       "      <td>2015-04-13</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>search_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90398</th>\n",
       "      <td>359779</td>\n",
       "      <td>2015-03-23</td>\n",
       "      <td>Desktop</td>\n",
       "      <td>Male</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90399</th>\n",
       "      <td>438929</td>\n",
       "      <td>2015-03-26</td>\n",
       "      <td>Mobile</td>\n",
       "      <td>Female</td>\n",
       "      <td>home_page</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>90400 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       user_id        date   device     sex  home_page  search_page  \\\n",
       "0       450007  2015-02-28  Desktop  Female  home_page          NaN   \n",
       "1       756838  2015-01-13  Desktop    Male  home_page          NaN   \n",
       "2       568983  2015-04-09  Desktop    Male  home_page  search_page   \n",
       "3       190794  2015-02-18  Desktop  Female  home_page  search_page   \n",
       "4       537909  2015-01-15  Desktop    Male  home_page          NaN   \n",
       "...        ...         ...      ...     ...        ...          ...   \n",
       "90395   307667  2015-03-30  Desktop  Female  home_page          NaN   \n",
       "90396   642989  2015-02-08  Desktop  Female  home_page  search_page   \n",
       "90397   659645  2015-04-13  Desktop    Male  home_page  search_page   \n",
       "90398   359779  2015-03-23  Desktop    Male  home_page          NaN   \n",
       "90399   438929  2015-03-26   Mobile  Female  home_page          NaN   \n",
       "\n",
       "      payment_page confirmation_page  \n",
       "0              NaN               NaN  \n",
       "1              NaN               NaN  \n",
       "2              NaN               NaN  \n",
       "3              NaN               NaN  \n",
       "4              NaN               NaN  \n",
       "...            ...               ...  \n",
       "90395          NaN               NaN  \n",
       "90396          NaN               NaN  \n",
       "90397          NaN               NaN  \n",
       "90398          NaN               NaN  \n",
       "90399          NaN               NaN  \n",
       "\n",
       "[90400 rows x 8 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_merge.columns = [\n",
    "    \"user_id\", \"date\", \"device\", \"sex\", \n",
    "    \"home_page\", \"search_page\", \"payment_page\", \"confirmation_page\"]\n",
    "df_merge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(90400, 8)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_merge.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. 计算每个页面的用户数目\n",
    "\n",
    "目的是给漏斗图填充数据，pyecharts需要的格式为：数据格式为[(key1, value1), (key2, value2)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('home_page', 90400), ('search_page', 45200), ('payment_page', 6030), ('confirmation_page', 452)]\n"
     ]
    }
   ],
   "source": [
    "datas = []\n",
    "for column in [\"home_page\", \"search_page\", \"payment_page\", \"confirmation_page\"]:\n",
    "    user_count = df_merge[column].dropna().size\n",
    "    datas.append((column, user_count))\n",
    "print(datas)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('home_page', 100.0),\n",
       " ('search_page', 50.0),\n",
       " ('payment_page', 6.67),\n",
       " ('confirmation_page', 0.5)]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方便查看对比，进行归一化\n",
    "max_count = datas[0][1]\n",
    "datas_norm = [\n",
    "              (x, round(y*100/max_count, 2)) \n",
    "              for x,y in datas\n",
    "             ]\n",
    "datas_norm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. 绘制漏斗图"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyecharts import options as opts\n",
    "from pyecharts.charts import Funnel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pyecharts.charts.basic_charts.funnel.Funnel at 0x212409cd188>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "funnel = Funnel()\n",
    "funnel.add(\"用户比例\", datas_norm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<script>\n",
       "    require.config({\n",
       "        paths: {\n",
       "            'echarts':'https://assets.pyecharts.org/assets/echarts.min'\n",
       "        }\n",
       "    });\n",
       "</script>\n",
       "\n",
       "        <div id=\"63f15ab2c2c64f8daa6c6ffbe26640af\" style=\"width:900px; height:500px;\"></div>\n",
       "\n",
       "<script>\n",
       "        require(['echarts'], function(echarts) {\n",
       "                var chart_63f15ab2c2c64f8daa6c6ffbe26640af = echarts.init(\n",
       "                    document.getElementById('63f15ab2c2c64f8daa6c6ffbe26640af'), 'white', {renderer: 'canvas'});\n",
       "                var option_63f15ab2c2c64f8daa6c6ffbe26640af = {\n",
       "    \"animation\": true,\n",
       "    \"animationThreshold\": 2000,\n",
       "    \"animationDuration\": 1000,\n",
       "    \"animationEasing\": \"cubicOut\",\n",
       "    \"animationDelay\": 0,\n",
       "    \"animationDurationUpdate\": 300,\n",
       "    \"animationEasingUpdate\": \"cubicOut\",\n",
       "    \"animationDelayUpdate\": 0,\n",
       "    \"color\": [\n",
       "        \"#c23531\",\n",
       "        \"#2f4554\",\n",
       "        \"#61a0a8\",\n",
       "        \"#d48265\",\n",
       "        \"#749f83\",\n",
       "        \"#ca8622\",\n",
       "        \"#bda29a\",\n",
       "        \"#6e7074\",\n",
       "        \"#546570\",\n",
       "        \"#c4ccd3\",\n",
       "        \"#f05b72\",\n",
       "        \"#ef5b9c\",\n",
       "        \"#f47920\",\n",
       "        \"#905a3d\",\n",
       "        \"#fab27b\",\n",
       "        \"#2a5caa\",\n",
       "        \"#444693\",\n",
       "        \"#726930\",\n",
       "        \"#b2d235\",\n",
       "        \"#6d8346\",\n",
       "        \"#ac6767\",\n",
       "        \"#1d953f\",\n",
       "        \"#6950a1\",\n",
       "        \"#918597\"\n",
       "    ],\n",
       "    \"series\": [\n",
       "        {\n",
       "            \"type\": \"funnel\",\n",
       "            \"name\": \"\\u7528\\u6237\\u6bd4\\u4f8b\",\n",
       "            \"data\": [\n",
       "                {\n",
       "                    \"name\": \"home_page\",\n",
       "                    \"value\": 100.0\n",
       "                },\n",
       "                {\n",
       "                    \"name\": \"search_page\",\n",
       "                    \"value\": 50.0\n",
       "                },\n",
       "                {\n",
       "                    \"name\": \"payment_page\",\n",
       "                    \"value\": 6.67\n",
       "                },\n",
       "                {\n",
       "                    \"name\": \"confirmation_page\",\n",
       "                    \"value\": 0.5\n",
       "                }\n",
       "            ],\n",
       "            \"sort\": \"descending\",\n",
       "            \"gap\": 0,\n",
       "            \"label\": {\n",
       "                \"show\": true,\n",
       "                \"position\": \"top\",\n",
       "                \"margin\": 8\n",
       "            }\n",
       "        }\n",
       "    ],\n",
       "    \"legend\": [\n",
       "        {\n",
       "            \"data\": [\n",
       "                \"home_page\",\n",
       "                \"payment_page\",\n",
       "                \"confirmation_page\",\n",
       "                \"search_page\"\n",
       "            ],\n",
       "            \"selected\": {\n",
       "                \"home_page\": true,\n",
       "                \"search_page\": true,\n",
       "                \"payment_page\": true,\n",
       "                \"confirmation_page\": true\n",
       "            }\n",
       "        }\n",
       "    ],\n",
       "    \"tooltip\": {\n",
       "        \"show\": true,\n",
       "        \"trigger\": \"item\",\n",
       "        \"triggerOn\": \"mousemove|click\",\n",
       "        \"axisPointer\": {\n",
       "            \"type\": \"line\"\n",
       "        },\n",
       "        \"textStyle\": {\n",
       "            \"fontSize\": 14\n",
       "        },\n",
       "        \"borderWidth\": 0\n",
       "    }\n",
       "};\n",
       "                chart_63f15ab2c2c64f8daa6c6ffbe26640af.setOption(option_63f15ab2c2c64f8daa6c6ffbe26640af);\n",
       "        });\n",
       "    </script>\n"
      ],
      "text/plain": [
       "<pyecharts.render.display.HTML at 0x212407c5a08>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "funnel.render_notebook()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
